---
title: 'Retrieval-Augmented Generation'
description: 'Learn how to implement retrieval-augmented generation (RAG) to enhance AI responses with external knowledge sources'
---

## What is Retrieval-Augmented Generation?

Retrieval-Augmented Generation (RAG) is a prompting technique that enhances large language model (LLM) responses by dynamically retrieving relevant information from external knowledge sources before generating a response. Rather than relying solely on the model's internal knowledge, RAG incorporates up-to-date, specific, and contextually relevant information from external databases, documents, or knowledge bases.
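
The retrieve-then-generate flow can be sketched in a few lines. This is a minimal in-memory illustration, not Latitude's implementation: `Passage`, `tokenize`, `retrieve`, and `buildAugmentedPrompt` are hypothetical names, and the term-overlap scoring is a stand-in for a real retriever such as a vector index.

```typescript
// Hypothetical types and helpers, for illustration only.
interface Passage {
  id: string
  text: string
}

// Lowercase a string and split it into alphanumeric word tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0)
}

// Score each passage by how many query terms it contains, then return
// the top `k` passages that matched at least one term.
function retrieve(query: string, corpus: Passage[], k: number): Passage[] {
  const queryTerms = tokenize(query)
  return corpus
    .map((passage) => {
      const words = tokenize(passage.text)
      const score = queryTerms.filter((term) => words.indexOf(term) !== -1).length
      return { passage, score }
    })
    .filter((scored) => scored.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((scored) => scored.passage)
}

// Place the retrieved passages ahead of the question so the model can answer from them.
function buildAugmentedPrompt(question: string, passages: Passage[]): string {
  const context = passages.map((p) => `[${p.id}] ${p.text}`).join('\n')
  return `Context:\n${context}\n\nQuestion: ${question}`
}

const corpus: Passage[] = [
  { id: 'doc-1', text: 'Qubits can exist in superposition.' },
  { id: 'doc-2', text: 'Sourdough bread needs a starter.' },
]
const question = 'What is superposition in qubits?'
const augmentedPrompt = buildAugmentedPrompt(question, retrieve(question, corpus, 1))
```

Here only `doc-1` shares terms with the question, so it is the only passage placed in the prompt's context.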

## Why Use Retrieval-Augmented Generation?

- **Factual Accuracy**: Grounds answers in retrieved sources, which reduces hallucinations and factual errors
- **Up-to-Date Information**: Retrieves current information from beyond the model's training cutoff
- **Domain Specialization**: Accesses domain-specific knowledge that is poorly represented in general LLM training data
- **Knowledge Grounding**: Lets responses cite their sources, which increases trustworthiness
- **Scalable Knowledge**: Draws on large knowledge bases without fine-tuning the base model
- **Customizable Responses**: Tailors responses to your own knowledge repositories

## Basic Implementation in Latitude

Here's a simple RAG implementation using Latitude:

```markdown RAG Basic Example
---
provider: OpenAI
model: gpt-4o
temperature: 0.7
tools:
  - search_knowledge_base:
      description: Retrieve information from the knowledge base
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
            description: The search query to retrieve relevant information
---

# Knowledge-Enhanced Response

Answer the user's question about {{ topic }} by first retrieving relevant information.

## User Question:
{{ user_question }}

## Information Retrieval:
I'll search for the most relevant information to answer this question.

## Comprehensive Answer:
Based on the retrieved information, I'll now provide a complete and accurate answer to the question.
```

## Advanced Implementation with Multiple Sources

This example shows a more sophisticated RAG implementation that retrieves information from multiple sources and evaluates their relevance:

<CodeGroup>
```markdown Advanced RAG
---
provider: OpenAI
model: gpt-4o
temperature: 0.3
tools:
  - search_knowledge_base:
      description: Retrieve information from the primary knowledge base
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
            description: The search query to retrieve relevant information
  - search_recent_documents:
      description: Retrieve information from recent documents for time-sensitive information
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
            description: The search query to retrieve recent information
---

<step>
# Initial Information Gathering

## User Question:
{{ user_question }}

## Primary Knowledge Search:
Let me search our main knowledge base for relevant information.

## Recent Documents Search:
I'll also check recent documents for any updates or new information.
</step>

<step>
# Information Synthesis

## Retrieved Information:
Let me analyze and synthesize the information from both sources:

1. **Main Knowledge Base Findings:**
   - Key facts and concepts retrieved
   - Relevant background information

2. **Recent Documents Findings:**
   - Updates or new information
   - Changes to previously established information

## Information Evaluation:
- Relevance score of each retrieved piece
- Consistency between sources
- Recency and reliability assessment
</step>

<step>
# Final Response Generation

Based on the synthesized information, I'll now provide a comprehensive answer:

## Answer:
[Comprehensive answer to the user's question]

## Sources:
[List of sources used with citations]

## Confidence Level:
[Assessment of confidence based on quality and consistency of retrieved information]
</step>
```
</CodeGroup>
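
On the application side, the evaluation step needs some way to reconcile what the two tools return. The sketch below shows one possible approach: keep the newest version of each document and record the text it replaced. The `SourcedResult` shape, its field names, and `reconcileSources` are assumptions made for this illustration, not part of Latitude.

```typescript
// Hypothetical result shape; the field names are assumptions for this sketch.
interface SourcedResult {
  docId: string
  source: 'knowledge_base' | 'recent_documents'
  updatedAt: string // ISO 8601 date such as '2024-05-01', so string order is date order
  text: string
}

interface ReconciledResult extends SourcedResult {
  supersedes?: string // text of the previous version, when it differs
}

// Keep the newest version of each document and record the text it replaced,
// so the synthesis step can point out that the information changed.
function reconcileSources(
  primary: SourcedResult[],
  recent: SourcedResult[],
): ReconciledResult[] {
  const oldestFirst = primary
    .concat(recent)
    .sort((a, b) => (a.updatedAt < b.updatedAt ? -1 : a.updatedAt > b.updatedAt ? 1 : 0))

  const latest = new Map<string, ReconciledResult>()
  oldestFirst.forEach((result) => {
    const older = latest.get(result.docId)
    if (older !== undefined && older.text !== result.text) {
      latest.set(result.docId, { ...result, supersedes: older.text })
    } else {
      latest.set(result.docId, { ...result })
    }
  })

  // Newest first, so the most current facts lead the context.
  return Array.from(latest.values()).sort((a, b) =>
    a.updatedAt < b.updatedAt ? 1 : a.updatedAt > b.updatedAt ? -1 : 0,
  )
}
```

Returning `supersedes` alongside the current text gives the model what it needs to fill in the "Changes to previously established information" part of the synthesis step.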

## Domain-Specific RAG Implementation

This example shows how to implement RAG for a specific domain (medical information):

<CodeGroup>
```markdown Medical RAG
---
provider: OpenAI
model: gpt-4o
temperature: 0.2
tools:
  - search_medical_database:
      description: Retrieve information from verified medical databases
      parameters:
        type: object
        additionalProperties: false
        required: ['query', 'search_depth']
        properties:
          query:
            type: string
            description: The medical search query
          search_depth:
            type: string
            enum: ['basic', 'detailed', 'comprehensive']
            description: The depth of search to perform
---

# Medical Information Assistant

I'll answer your medical question by retrieving information from verified medical databases.

## Medical Question:
{{ medical_question }}

## Information Retrieval:
Searching medical databases for clinically validated information...

## Medical Answer:
Based on information from verified medical sources:

[Detailed answer with medical references]

## Important Notice:
This information is for educational purposes only and not a substitute for professional medical advice. Always consult a healthcare provider for medical concerns.
```
</CodeGroup>
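
The `search_depth` parameter only has an effect if the tool handler translates it into retrieval settings. A minimal sketch of such a mapping follows. The three enum values come from the schema above, while `RetrievalSettings`, `DEPTH_SETTINGS`, `settingsForDepth`, and the specific numbers are assumptions chosen for illustration.

```typescript
// The three depth values come from the tool schema; the numbers are assumptions.
type SearchDepth = 'basic' | 'detailed' | 'comprehensive'

interface RetrievalSettings {
  topK: number // how many passages to fetch
  minScore: number // similarity threshold below which matches are dropped
}

const DEPTH_SETTINGS: Record<SearchDepth, RetrievalSettings> = {
  basic: { topK: 3, minScore: 0.8 },
  detailed: { topK: 8, minScore: 0.7 },
  comprehensive: { topK: 15, minScore: 0.6 },
}

// Narrow an arbitrary string to one of the schema's enum values.
function isSearchDepth(value: string): value is SearchDepth {
  return value === 'basic' || value === 'detailed' || value === 'comprehensive'
}

// Fall back to the narrowest search when the model sends an unexpected value.
function settingsForDepth(depth: string): RetrievalSettings {
  return isSearchDepth(depth) ? DEPTH_SETTINGS[depth] : DEPTH_SETTINGS.basic
}
```

A handler for `search_medical_database` could call `settingsForDepth(search_depth)` and pass the resulting `topK` and `minScore` to whichever database client it uses.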

## Best Practices for RAG

To implement retrieval-augmented generation effectively:

1. **Optimize Search Queries**
   - Extract key entities and concepts from user questions
   - Use query expansion to find related information
   - Implement query reformulation techniques

2. **Vector Database Setup**
   - Choose appropriate embedding models for your content
   - Implement chunking strategies based on content type
   - Use metadata filtering to improve retrieval precision

3. **Result Processing**
   - Rank results by relevance and recency
   - Filter out irrelevant or low-quality retrievals
   - Rerank results based on semantic similarity

4. **Source Integration**
   - Include source attribution in responses
   - Assess source credibility and prioritize reliable sources
   - Handle conflicting information from multiple sources

5. **Information Synthesis**
   - Combine information from multiple sources coherently
   - Identify and resolve contradictions
   - Preserve the original context and keep the answer factually consistent
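
Two of these practices, chunking and result processing, can be sketched as small pure functions. This is an illustrative outline under simple assumptions: chunks are measured in words rather than tokens, duplicates are detected by exact text match, and `chunkText`, `processResults`, and `Match` are hypothetical names.

```typescript
// Split text into overlapping word windows so that content cut at a chunk
// boundary still appears intact in the neighbouring chunk.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('Expected chunkSize > 0 and 0 <= overlap < chunkSize')
  }
  const words = text.split(/\s+/).filter((word) => word.length > 0)
  const chunks: string[] = []
  const step = chunkSize - overlap
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '))
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= words.length) break
  }
  return chunks
}

// Hypothetical shape of a retrieval hit.
interface Match {
  id: string
  score: number // similarity score, higher is better
  text: string
}

// Drop low-scoring matches, order the rest by score, remove exact duplicate
// passages (keeping the highest-scoring copy), and cap the result count.
function processResults(matches: Match[], minScore: number, limit: number): Match[] {
  const seenTexts = new Set<string>()
  return matches
    .filter((match) => match.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .filter((match) => {
      if (seenTexts.has(match.text)) return false
      seenTexts.add(match.text)
      return true
    })
    .slice(0, limit)
}
```

For example, `chunkText('a b c d e', 3, 1)` yields the two chunks `'a b c'` and `'c d e'`, with the word `c` shared between them.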

## Integrating RAG with the Latitude SDK

Here's how to implement RAG using the Latitude SDK with external knowledge sources:

<CodeGroup>
```typescript RAG with Pinecone
import { Pinecone } from '@pinecone-database/pinecone'
import OpenAI from 'openai'
import { Latitude } from '@latitude-data/sdk'

type Tools = { search_knowledge_base: { query: string } }

const PINECONE_INDEX_NAME = 'your-knowledge-index'

async function run() {
  const sdk = new Latitude(process.env.LATITUDE_API_KEY, {
    projectId: Number(process.env.PROJECT_ID),
    versionUuid: 'live',
  })
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
  const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY })
  const pc = pinecone.Index(PINECONE_INDEX_NAME)

  const userQuestion = "What's the latest research on quantum computing applications?"

  const result = await sdk.prompts.run<Tools>('rag/quantum-research', {
    parameters: { user_question: userQuestion },
    tools: {
      search_knowledge_base: async ({ query }) => {
        // Create embedding from query
        const embedding = await openai.embeddings
          .create({
            input: query,
            model: 'text-embedding-3-small',
          })
          .then((res) => res.data[0].embedding)

        // Search vector database
        const searchResults = await pc.query({
          vector: embedding,
          topK: 5,
          includeMetadata: true,
        })

        // Format and return results, skipping matches that have no stored content
        return searchResults.matches
          .map((match) => match.metadata?.content)
          .filter((content) => typeof content === 'string')
          .join('\n\n')
      },
    },
  })

  console.log('AI Response:', result.response.text)
}

run()
```
</CodeGroup>
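
To support source attribution, the tool handler can return numbered, labeled passages instead of bare text. The helper below is a sketch under assumptions: it expects each match to carry a `content` metadata field (as in the example above) and optionally a `source` field, which is a hypothetical addition. `RetrievedMatch` and `formatMatches` are illustrative names, not part of the Pinecone or Latitude SDKs.

```typescript
// Minimal structural type for a retrieval hit; `source` in metadata is an assumption.
interface RetrievedMatch {
  id: string
  score?: number
  metadata?: Record<string, unknown>
}

// Turn matches into numbered passages labeled with their source, skipping
// any match that has no usable text content.
function formatMatches(matches: RetrievedMatch[]): string {
  const lines: string[] = []
  matches.forEach((match) => {
    const content = match.metadata ? match.metadata['content'] : undefined
    if (typeof content !== 'string' || content.trim() === '') return
    const source = match.metadata ? match.metadata['source'] : undefined
    // Fall back to the record ID when no source label was stored.
    const label = typeof source === 'string' ? source : match.id
    lines.push(`[${lines.length + 1}] (${label}) ${content.trim()}`)
  })
  return lines.length > 0 ? lines.join('\n\n') : 'No relevant information found.'
}
```

Returning an explicit "no results" message, rather than an empty string, gives the model a clear signal that it should not answer from retrieved context.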

## Advanced RAG Techniques

### Recursive Retrieval

Implement multi-hop retrieval for complex questions:

```markdown Recursive RAG
---
provider: OpenAI
model: gpt-4o
temperature: 0.5
tools:
  - search_knowledge_base:
      description: Retrieve information from the knowledge base
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
            description: The search query
---

<step>
# Question Decomposition

## Original Question:
{{ complex_question }}

## Sub-questions:
Let me break this down into smaller, answerable sub-questions:
1. [First sub-question]
2. [Second sub-question]
3. [Third sub-question]
</step>

<step>
# Progressive Retrieval

I'll search for information to answer each sub-question:

## First Sub-question Search:
[Search and retrieve information]

## Second Sub-question Search:
[Search and retrieve information]

## Third Sub-question Search:
[Search and retrieve information]
</step>

<step>
# Integrated Response

Based on all the information retrieved for each sub-question:

## Comprehensive Answer:
[Integrated answer that addresses the original complex question]
</step>
```
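
The retrieval step of this prompt can also be orchestrated in application code. The sketch below mirrors the prompt's structure: one search per sub-question, then a combined context for the final answer. `SearchFn`, `retrievePerSubQuestion`, and `buildContext` are hypothetical names, and the search function is injected so any retriever can be plugged in.

```typescript
// Hypothetical search signature; in practice this would wrap a vector or keyword index.
type SearchFn = (query: string) => Promise<string[]>

interface HopResult {
  subQuestion: string
  passages: string[]
}

// Run one retrieval per sub-question, in order, skipping passages already
// returned by an earlier hop so the final context has no duplicates.
async function retrievePerSubQuestion(
  subQuestions: string[],
  search: SearchFn,
  perHop: number,
): Promise<HopResult[]> {
  const seen = new Set<string>()
  const hops: HopResult[] = []
  for (const subQuestion of subQuestions) {
    const fresh = (await search(subQuestion))
      .filter((passage) => !seen.has(passage))
      .slice(0, perHop)
    fresh.forEach((passage) => seen.add(passage))
    hops.push({ subQuestion, passages: fresh })
  }
  return hops
}

// Flatten the hops into a single context block for the final answer step.
function buildContext(hops: HopResult[]): string {
  return hops
    .map((hop, i) => {
      const bullets = hop.passages.map((passage) => `- ${passage}`).join('\n')
      return `Sub-question ${i + 1}: ${hop.subQuestion}\n${bullets}`
    })
    .join('\n\n')
}
```

The hops run sequentially rather than in parallel so that later hops can skip passages an earlier hop already returned. A variant could also feed earlier findings into later queries.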

### Hybrid Retrieval

Combine different retrieval methods for better results:

<CodeGroup>
```markdown Hybrid RAG
---
provider: OpenAI
model: gpt-4o
temperature: 0.3
tools:
  - keyword_search:
      description: Traditional keyword-based search
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
  - semantic_search:
      description: Vector-based semantic search
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
---

# Hybrid Retrieval System

## User Query:
{{ user_query }}

## Keyword Search:
I'll perform a traditional keyword search for exact matches.

## Semantic Search:
I'll also conduct a semantic search to find conceptually related information.

## Combined Results:
Analyzing and combining results from both search methods to provide the most comprehensive answer:

[Comprehensive answer combining both search approaches]
```
</CodeGroup>
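
The prompt above leaves the combination step to the model. When the two result lists are merged in application code instead, one common option is reciprocal rank fusion (RRF), shown here as an alternative and not as the method this prompt uses. In the sketch, each list contributes `1 / (k + rank)` to a document's score. `reciprocalRankFusion` is an illustrative name, and `k = 60` is a commonly used default, not a required value.

```typescript
// Reciprocal rank fusion: every ranking adds 1 / (k + rank) to a document's
// score, where rank is the document's 1-based position in that ranking.
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>()
  rankings.forEach((ranking) => {
    ranking.forEach((docId, index) => {
      const rank = index + 1
      scores.set(docId, (scores.get(docId) || 0) + 1 / (k + rank))
    })
  })
  // Highest fused score first.
  return Array.from(scores.keys()).sort(
    (a, b) => (scores.get(b) || 0) - (scores.get(a) || 0),
  )
}

// Document IDs as each (hypothetical) search tool might rank them.
const keywordHits = ['doc-a', 'doc-b', 'doc-c']
const semanticHits = ['doc-b', 'doc-d', 'doc-a']
const fused = reciprocalRankFusion([keywordHits, semanticHits])
// fused is ['doc-b', 'doc-a', 'doc-d', 'doc-c']
```

Because RRF uses only rank positions, it avoids having to normalize keyword scores and vector similarity scores onto a common scale.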

## Related Techniques

Retrieval-Augmented Generation works well when combined with other prompting techniques:

1. **Chain-of-Thought with RAG**: Combine retrieved information with step-by-step reasoning for complex problem-solving.

2. **Self-Consistency with RAG**: Generate multiple RAG-enhanced responses and select the most consistent one.

3. **Few-Shot Learning with RAG**: Augment few-shot examples with retrieved information to improve performance on specialized tasks.

4. **Constitutional AI with RAG**: Use retrieved guidelines or policies to ensure AI responses comply with specific rules.

5. **Template-Based Prompting with RAG**: Use retrieved information to fill in template slots for more accurate and contextual responses.

## Real-World Applications

RAG is particularly valuable in these domains:

- **Enterprise Knowledge Management**: Accessing internal documents, policies, and knowledge bases
- **Legal Research**: Retrieving relevant case law, statutes, and legal opinions
- **Medical Information Systems**: Accessing up-to-date medical research and clinical guidelines
- **Customer Support**: Retrieving product information and troubleshooting guides
- **Educational Platforms**: Providing accurate and source-backed answers to student questions
- **Financial Analysis**: Accessing market data and financial reports for informed analysis
